Tomato, including its processed products, is one of the most widely consumed fruits. Its domestication, however,
has resulted in the loss of some 95% of the genetic and chemical diversity of wild relatives. In order to 
elucidate this diversity, exploit its potential for plant breeding, as well as understand its biological 
significance, analytical approaches have been developed, alongside the production of genetic crosses of wild 
relatives with commercial varieties. In this article, we describe a multi-platform metabolomic analysis, using 
NMR, mass spectrometry and HPLC, of introgression lines of Solanum pennellii with a domesticated line in
order to analyse and quantify alleles (QTL) responsible for metabolic traits. We have identified QTL for 
health-related antioxidant carotenoids and tocopherols, as well as molecular signatures for some 2000 
compounds. Correlation analyses have revealed intricate interactions in isoprenoid formation in the plastid 
that can be extrapolated to other crop plants. 

The cultivated tomato, Solanum lycopersicum, is one of the most widely consumed fruits. Over 150 million
metric tons are produced annually, contributing to a $35 billion industry 1 . Many of the essential and 
beneficial nutrients in the human diet, such as antioxidants, vitamins and minerals, are derived from tomato 
fruit and its products 2 . In addition to being an economically important crop, it is also a model species, impacting 
on several areas of plant biology, such as fruit physiology and development 3 , quantitative genetics and plant 
breeding 4,5 . Over the last decade an impressive array of genetic resources for such studies has been developed, 
including populations of genetically defined inbreds, generated through crossing S. lycopersicum with wild 
relatives 6 , mutant collections 7 and TILLING (targeting induced local lesions in genomes) platforms, generating 
a diverse range of mutants 8 . Procedures for the evaluation of protein composition 9 , enzyme activities 10 and 
metabolites 11 have also been developed. These multiple levels of analysis can now be exploited, following the 
sequencing of the tomato genome 12 (http://solgenomics.net/), to enable a step-change in our ability to utilise the 
metabolic diversity and natural genetic variation of wild relatives for improving crop quality through
metabolomics-assisted breeding 13-15 . It has been estimated that less than 5% of the genetic variation of wild relatives is
present in domesticated species, due to selection of preferred genotypes in existing germplasm 16 , a process known 
as domestication syndrome 17 . 

Metabolite profiling approaches have been used with several introgression line (IL) populations, including that of
S. pennellii × S. lycopersicum, to evaluate chemical composition, identify quantitative trait loci (QTL) and
facilitate their resolution 18-20 . In the present article, a complementary set of metabolomic approaches has been
implemented to greatly extend existing metabolite data resources. Extracts of ripe fruit from the S. pennellii IL
population have been analysed by NMR, positive (+ve) and negative (-ve) direct infusion mass spectrometry
(Di-MS), high performance liquid chromatography with photodiode array detection (LC-PDA/MS) and gas
chromatography-MS (GC-MS) and compared to the domesticated cultivar, M82. Identification of metabolite
alleles in ILs, as well as computational integration of the datasets has been carried out and metabolic networks 
constructed for the tomato fruit metabolome and the individual pathway components. These data represent a 
valuable and timely resource to fully capitalise on the sequencing of tomato genomes. In addition, the applicability 
of tomato as a model for other crop plants means that the analytical approach developed in this study and data 
generated provide a generic resource. 

Results 

Introgression lines show significantly different metabolite profiles to M82. Principal component analysis 
(PCA) for the raw intensities of all variables yielded separation between the two seasonal crops for the ILs and 
M82 (Supplementary Figure S1). No defined classes of metabolites
within a crop were responsible for clustering; instead a variety of 
intermediary metabolites were identified. Re-analysis of the dataset 
following normalization to the M82 control metabolites, however, 
indicated a high degree of co-localization of the genotypes to the 
same region of the scatter plot (Supplementary Figure S2). PCA of 
metabolites from each of the 76 ILs, using score and loadings plots, 
showed separation from the M82 parent (Supplementary Figure S3, 
A-L), indicating that in each IL the chemical composition was 
significantly altered by the presence of the introgressed genomic 
regions on each chromosome from the S. pennellii wild relative. In 
all cases, the variation could not be attributed to one metabolite. 

To analyze individual metabolite perturbations throughout the metabolome, the putative
metabolites, ascertained from the PlantCyc database, in combination with those confirmed
through targeted metabolite profiling, were tabulated to show fold changes in molecular
features relative to M82 (Supplementary Dataset S1,
ftp://ftp.solgenomics.net/projects/uk_metabolomics/Solanum_pennellii/) and categorized into
over 600 pathways (Supplementary Dataset S2,
ftp://ftp.solgenomics.net/projects/uk_metabolomics/Solanum_pennellii/). Representation in
Cytoscape (http://www.cytoscape.org/) revealed metabolite changes exclusive to an IL, or
those common to several ILs (Supplementary Figure S4, A-C), whilst chromosome maps,
illustrating compounds that increase or decrease two-fold relative to M82, are shown in
Supplementary Figure S5, A-L. An overriding feature of the data is that no IL contains a
perturbation in a single pathway or metabolite. The number of pathways affected within an
IL can vary, as can the number of metabolites affected per pathway. For example, IL11-4-1
has putative metabolite changes in only 389 out of 662 PlantCyc pathways (59%), the fewest
shown, whereas all other ILs show changes in over 75% of the pathways. The pathways
affected are often diverse, even within the same IL. Across the whole population the
down-regulation of metabolites is slightly more predominant. Of the 20,309 statistically
significant variable changes (p < 0.05), 10,377 were negative and 9,932 positive, but this
number decreased by 85% after false discovery rate (FDR) calculations (q < 0.05). A
decrease or increase in metabolites is not exclusive to any one genotype, but across the
population specific ILs show predominantly up- or down-regulation of metabolites. For
example, of 321 altered variables (p < 0.05, q < 0.05) in IL 3-2, 291 are negative, whereas
IL 7-4 has 364 positive changes out of 499 true altered variables (p < 0.05, q < 0.05). In
total, 45 ILs have net metabolite increases, with 29 ILs showing net decreases of
metabolites. Changes in metabolite levels that are common to multiple ILs can be observed
within the data, with overlapping introgressed regions yielding discrete clusters of
metabolites. These changes are not always restricted to overlapping regions, but can be
found between ILs with no DNA regions in common.
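
The effect of the FDR step on the number of significant variables can be illustrated with a short sketch. The Methods cite the q-value method of Storey and Tibshirani; the code below substitutes the simpler Benjamini-Hochberg adjustment, which corresponds to a q-value computed with the proportion of true null hypotheses fixed at 1 and is therefore somewhat more conservative. The function names and the example p-values are hypothetical and are not taken from the study.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Stand-in for the Storey-Tibshirani q-value (assumption: pi0 = 1).
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_significant(pvalues, alpha=0.05):
    """Return (n with p < alpha, n with p < alpha and adjusted p < alpha)."""
    q = bh_adjust(pvalues)
    n_p = sum(p < alpha for p in pvalues)
    n_q = sum(p < alpha and qv < alpha for p, qv in zip(pvalues, q))
    return n_p, n_q


# Hypothetical p-values for five variables in one IL.
example = [0.001, 0.008, 0.039, 0.041, 0.6]
print(count_significant(example))  # (4, 2)
```

In this toy example four variables pass the p-value threshold but only two survive the adjustment, which mirrors, on a small scale, the reduction reported for the full dataset.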

Introgressions on chromosomes 3, 6, 8 and 12 are associated with changes to carotenoids
and tocopherols. ILs 3-2, 6-3, and 12-2 have distinct fruit colour phenotypes, due to
altered carotenoid pigment composition (Figures 1 to 3). The underlying candidate genes
for these QTL are phytoene synthase (Psy-1), the chromoplast-specific lycopene β-cyclase
(B-CYC) and lycopene ε-cyclase (ε-Lcy), located on chromosomes 3, 6 and 12,
respectively 7 . The loadings plots for each of these ILs, however, compared to M82,
indicate significant variation in other metabolites, not just those found in the
carotenoid pathway. For example, the metabolites contributing to the overall variance
included sucrose in IL 3-2, glutamate in IL 6-3 and adenosine monophosphate in IL 12-2
(Figures 1D to 3D). Metabolite profiling of ILs 8-2-1 and 8-2 revealed an increase in
α-tocopherol and a decrease in γ-tocopherol, with the overall total tocopherol levels
elevated 2-fold (Figure 4C). To validate the identity of α- and γ-tocopherol, and their
amounts relative to M82, targeted LC-MS was performed. Co-chromatography of these
compounds on several HPLC/UPLC systems, together with UV/Vis, MS and MS/MS spectra
identical to those of the authentic standards, enabled unambiguous identification of
these tocopherols (Figure 4B). The scatter plots revealed both metabolites to be common
to the two Chr8 ILs, whilst the physical map showed the levels of these two metabolites
to be increased and decreased, respectively, in both 8-2-1 and 8-2 (Supplementary Figure
S5H).

Correlation networks for plastid isoprenoid formation reveal 
linkage of core, intermediary and tertiary metabolism. Correlation 
networks for metabolism occurring in ripening fruit have been built 
from the metabolomic databases. These networks have been 
constructed for individual genotypes and collectively across the 
population, using biochemical pathways in the Kyoto Encyclopedia 
of Genes and Genomes (KEGG) database (http://www.genome.jp/ 
kegg/). The strength of the networks ranged from 0.2 to 0.8, with an 
average of 0.45 (Figure 5). Positive and negative correlations occurred,
with positive correlations predominating. The overriding feature of the
network is the grouping of primary metabolism in the centre of the 
network, with secondary metabolic components on the periphery, 
with just a few metabolites creating links to the core. Isoprenoid 
biosynthesis, which is responsible for the formation of vitamin E 
(tocopherols) and carotenoids, was used to demonstrate the utility 
of the dataset. The sectors of metabolism displaying strong 
associations to tocopherols and carotenoids are visualised in Figure 5. 
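
A minimal sketch of how network edges of this kind could be derived is given below. It assumes that each metabolite is represented by its mean fold changes across the ILs, and it keeps pairs whose Pearson coefficient has an absolute value of at least 0.6, the lower bound quoted in the Figure 5 legend. The function names and the toy profiles are hypothetical, and the significance test (p < 0.05, two-tailed) applied in the study is omitted here for brevity.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is 0).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_edges(profiles, min_abs_r=0.6):
    """List (a, b, r, sign) for metabolite pairs with |r| >= min_abs_r.

    `profiles` maps a metabolite name to its fold changes across ILs.
    """
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= min_abs_r:
            edges.append((a, b, r, "positive" if r > 0 else "negative"))
    return edges


# Toy fold-change profiles across four hypothetical ILs.
profiles = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [4.0, 3.0, 2.0, 1.0],
    "D": [1.0, -1.0, -1.0, 1.0],
}
for edge in correlation_edges(profiles):
    print(edge)
```

Each tuple can be written out as one row of an edge table (source, target, weight, sign), which is a format that network tools such as Cytoscape can import.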

Discussion 

The effect on metabolite profiles of the cultivation of the same 
tomato population in two seasons is evident from the PCA, shown 
in Supplementary Figure S1. However, when the metabolite changes
to the control (M82) line are used to normalize the data, then close
clustering of the profiles occurs (Supplementary Figure S2). Thus,
environmental/cultivation conditions, even in a glasshouse, cause
changes in the chemical composition of the IL population. These
environmental effects were minimized by normalizing the data, so
that each variable per plant is relative to the mean of the variables
from all control plants. Collectively, the statistical and multivariate
analyses of the normalized data indicate that the dataset is robust and
accurate, and that the effects of genetic determinants predominate over
environmental influences, allowing metabolite perturbations caused
solely by the introgression of the S. pennellii genome into M82 to be
analysed for genetic determinants. A similar approach, using field-grown
crops and focusing on hydrophilic metabolites, has been
reported 11 , whilst a later investigation verified the heritability of 
QTL and the validity of this approach 20 . The changes in the metabolite composition of
the ILs (Figures 1-3; Supplementary Figures S3-S5; Supplementary Datasets S1 and S2) may
be due to a number of factors. These include alleles of a single gene (biosynthetic or
regulatory), multiple unrelated alleles, or changes in a metabolite or group of
metabolites that affect other pathways or processes via transcriptional or
post-transcriptional mechanisms, including variation in the kinetic properties of enzymes
catalysing step(s) in metabolic pathways. In previous studies on genetically modified
tomato fruit, we have found that post-transcriptional controls, rather than gene
expression, are important for changes to carotenoid levels 21 . The ability of the
introgressed regions to modify whole pathways and processes suggests an important role
for transcriptional regulators in the control of metabolism. The breadth and diversity
of the changes would also suggest that regulation by small molecules must not be
overlooked. Although there is no correlation between the size of an introgression and
the number of metabolite changes occurring within it, those with the greatest number of
changes are presumably due to regulators with multiple roles being present in the
introgressed regions. A recent report has shown that the regulatory gene ethylene
response factor 6 (SlERF6) influences both carotenogenesis and additional ripening
phenotypes 22 . So-called master regulators are well documented in plant secondary
metabolism 23 .

Figure 1 | Metabolite changes associated with the altered colour phenotype of IL3-2. (A) Carotenoid profiles, obtained by HPLC-UV/Vis analysis,
recorded at 450 nm; (i), M82; (ii), IL3-2. The chromatographic peaks and UV/Vis spectra are 1-lutein, 2-β-carotene and 3-lycopene. (B) Quantitative
changes associated with pathway components. The position of the proposed pathway block (the fruit-specific phytoene synthase-1) is indicated. (C)
Physical map of chromosome 3, showing the position of IL3-2 and Psy-1. (D) PCA of the metabolomic dataset for IL3-2; (i), scatter diagram of the score
values, highlighting the separation based on variance in chemical composition between IL3-2 (brown dots) and the M82 control (red dots). The %
contribution of each component to the variance is shown; (ii), loadings plot of variables with the identity of some of the variables annotated. Only variables
contributing to a significant difference (p-value of < 0.05) are shown.

The large metabolomic datasets created in this study enable the identification of
metabolites associated with important agronomic and consumer traits. Such comprehensive
phenotyping is recognised for its diagnostic strength in genomics-assisted selection for
crop improvement 24,25 . In total, 2,820 mQTL can be attributed to health-related traits,
with 1,474 showing increased and 1,346 decreased levels, compared to M82. When used in
combination with the genome sequence, the present dataset will provide a direct route to
candidate genes, whose commercial potential for improving the nutritional quality of
crops can then be tested. For example, ILs 3-2, 6-3 and 12-2 have changes to carotenoids,
whilst ILs 8-2 and 8-2-1 exhibit increases in α-tocopherol (Figures 1-4). It is likely
that the changes in product/precursor ratio of tocopherols relate to levels of γ-methyl
tocopherol transferase, which is responsible for the conversion of γ-tocopherol to
α-tocopherol and located at the end-point of the biosynthetic pathway (Figure 4D). The
γ-methyl tocopherol transferase gene (SGN-U584511) is located within the overlapping
region of 8-2-1 and 8-2 (Figure 4D). Allelic variation in the S. pennellii γ-methyl
transferase has been identified 26 . These data demonstrate the potential of the dataset,
in combination with the genome sequence and other genetic resources, to rapidly assign
traits to candidate genes, with the recent advent of Marker2sequence facilitating
identification 27 . Such a route to candidate genes underlying important QTL also helps
validate and build confidence in the large-scale metabolomic approach.

Since the provision of essential nutrients, particularly antioxidants, in the human diet
is a key attribute of tomato products, networks associated with health-promoting
phytochemicals were prepared and interrogated using the metabolite datasets
(Supplementary Datasets S1 and S2). Several important, generic features, reflecting
metabolism in general, can be deduced: (i) precursors common to multiple pathways act as
important nodes in the network; (ii) the influence of cofactors (reductants) in metabolic
pathways is important and appears to have been undervalued previously; (iii) metabolism
does not necessarily partition into primary and secondary types; (iv) the strong link
between the utilisation of photosynthate and renewable formation; and (v) the
independence of the plastid-localized isoprenoid formation and specific classes of
isoprenoid metabolism. For example, no strong connections were apparent between plastidial
2-C-methyl-D-erythritol-4-phosphate (MEP)-derived isoprenoids in tomato and those
non-plastidic compounds formed via mevalonate. In ripe tomato fruit, therefore, the
existence of isoprenoid precursor exchange to and from the plastid is unlikely, which is
consistent with the previous experimental evidence of the independent modulation of the
two pathways 28 . Whether associations exist between earlier intermediates in core
metabolism and other organelles awaits further experimentation, as discussed recently 29 .

Figure 2 | Metabolite changes associated with the altered colour phenotype of IL6-3. (A) Carotenoid profiles, obtained by HPLC-UV/Vis analysis,
recorded at 450 nm; (i), M82; (ii), IL6-3. The chromatographic peaks and UV/Vis spectra are 1-lutein, 2-β-carotene and 3-lycopene. (B) Quantitative
changes associated with pathway components. The position of the proposed up-regulated step in the pathway (the fruit-specific lycopene β-cyclase,
CYC-B) is indicated. (C) Physical map of chromosome 6, showing the position of IL6-3 and CYC-B. (D) PCA of the metabolomic dataset for IL6-3; (i), scatter
diagram of the score values, highlighting the separation based on variance in chemical composition between IL6-3 (brown dots) and the M82 control (red
dots). The % contribution of each component to the variance is shown; (ii), loadings plot of variables with the identity of some of the variables annotated.
Only variables contributing to a significant difference (p-value of < 0.05) are shown.

The biosynthesis of isoprenoids, such as tocopherols and carotenoids, and the degradation
of chlorophyll also group independently (Figure 5). However, key intermediates that are
common to these pathways display a high level of connectivity, suggesting they are key
regulatory hubs. If the network data are interpreted from a predictive viewpoint for the
design of metabolic engineering strategies, then geranylgeranyl diphosphate synthase
(GGPP synthase), which is responsible for the biosynthesis of the common C20 precursors
used in carotenoid and tocopherol formation, and isopentenyl diphosphate
isomerase/dimethylallyl diphosphate isomerase (IPP/DMAPP isomerase), catalysing the
synthesis of the universal C5 building blocks for all isoprenoids, would be important
targets. In bacteria, GGPP synthase(s) and IPP/DMAPP isomerase(s) have been amplified with
notable increases in end-products 30,31 . Under stress conditions, IPP/DMAPP isomerase has
shown positive effects on carotenoid biosynthesis 32 . These findings are supported by
flux coefficients for these enzymes in ripening fruit 28 . In contrast, phytoene synthase
has a high flux coefficient and numerous transgenic experiments have demonstrated the
influence this biochemical step can have over the carotenoid pathway 33 . Interestingly,
the correlation network places phytoene and its products at the edge of the network, with
very few connections (Figure 5). Several other metabolites associated with biochemical
steps amenable to metabolic engineering predominate in the same area, with few
connections, such as β-carotene and γ-tocopherol, both of which have been used as
precursors for enhancing levels of valuable products in plant hosts 34,35 . Although the
correlation networks suggest several common pathway precursors as metabolic hubs, this
cannot be readily validated experimentally, because of the effect of regulatory processes.
For example, GGPP synthase and IPP isomerase are members of multigene families 36 , where
redundancy could occur in order to safeguard the essential nature of these biosynthetic
steps through compensatory isoenzymes. To a degree, this highly tuned regulation can be
observed from the way that carotenoid content can be elevated readily in plant tissues
where their formation is not endogenous, compared to tissues in which optimised synthesis
and sequestration have evolved 37 .

Figure 3 | Metabolite changes associated with the altered colour phenotype of IL12-2. (A) Carotenoid profiles, obtained by HPLC-UV/Vis analysis,
recorded at 450 nm; (i), M82; (ii), IL12-2. The chromatographic peaks and UV/Vis spectra are 1-lutein, 2-β-carotene and 3-lycopene. (B) Quantitative
changes associated with pathway components. The position of the proposed up-regulation in the pathway (lycopene ε-cyclase, Lyc-E) is indicated. (C)
Physical map of chromosome 12, showing the position of IL12-2 and Lyc-E. (D) PCA of the metabolomic dataset for IL12-2; (i), scatter diagram of the
score values, highlighting the separation based on variance in chemical composition between IL12-2 (brown dots) and the M82 control (red dots).

From network analysis, the individual isoprenoid pathways are
connected via sequential prenyl lipid and methylerythritol phosphate
(MEP) predominating clusters to the Calvin-Benson cycle at the core
(Figure 5). Traditionally, the terms primary and secondary metabolism
are used to describe different types of cellular metabolism. The
metabolic networks constructed in the present study, however, do
not fit into this classical format. For isoprenoid biosynthesis in
tomato fruit, the Calvin-Benson cycle is positioned at the core and
associated via intermediary modules, represented by the MEP and
prenyl lipid pathways, to a tertiary level, which includes tocopherol
and carotenoid biosynthesis. Thus, three distinct levels occur: core,
intermediary and tertiary. The Calvin-Benson cycle is at the core of
the network in ripening fruit, yet this process is typically associated
with photosynthetic tissues. Although tomato fruit are photosynthetically
active during fruit development 38 , photosynthesis is not a key aspect of
their metabolism. Recently, the role of photosynthesis in fruit development
has been questioned, with the belief that photosynthate is
derived from vegetative material 39 . The integrative characterisation
of transgenic tomato plants possessing fruit-specific down-regulation
of DETIOLATED1 40 is another example where core metabolism,
particularly carbon fixation, appears to be the progenitor of downstream
secondary, or in this case, tertiary pathways in tomato fruit.
The metabolite of the Calvin cycle displaying the greatest connection
was D-glyceraldehyde-3-phosphate, which is common to both the
Calvin cycle and the MEP pathway. Collectively, these findings suggest
that the plasticity of Calvin cycle components in tomato fruit
warrants further investigation, and that the utilization of photosynthate
carbon close to its source, without further resource allocation, could
be a valuable strategy to explore as part of a combinatory engineering
approach. Other metabolic links between isoprenoid formation and
photosynthetic carbon flow have been revealed through metabolomic
analysis of hemiterpenoid glycosides under nutrient deprivation 41 .
Finally, cofactors/reductants appear to have an important
influence over multiple pathways, presumably by direct action on
biosynthetic enzymes, or perhaps via the modulation of redox state,
as suggested for cross talk between cytosolic and plastidial isoprenoid
formation 42 , as well as the dual role of plastidial terminal oxidase in
carotenoid formation and chlororespiration 43 .

In summary, the data presented in this investigation represent one 
of the most comprehensive metabolomic studies performed in planta 
and have increased our understanding of the structure and dynamics 
of the isoprenoid pathway network, a necessity for biotechnological 
applications 44 . In combination with the recently published potato 
and tomato genome sequences, a valuable resource has been created 
to associate traits to metabolites, and metabolites to candidate genes 
and regions of multiple QTL. We envisage that the resource will be 
used to extract leads for functional genomics and that the data will 
impact directly on plant breeding strategies, help elucidate underlying
molecular mechanisms associated with traits and provide
exploitable datasets for systems biology approaches. 

Methods 

Cultivation of tomato crops. The S. lycopersicum (M82) × S. pennellii IL population,
obtained from the Tomato Genetics Resource Center (http://tgrc.ucdavis.edu/pennellii_ils.aspx),
was grown in glasshouse conditions (16 h day length, day temp 20°C, night temp 18°C) over
two seasons. Plants were grown in 7.5 litre pots of Levington M2 compost, with irrigation
supplemented with Vitax 214. The population was planted in four blocks, each containing
one plant of each of the 76 S. pennellii ILs, with three M82 controls.

Figure 4 | The identification of candidate genes from the metabolomic dataset. (A) Changes in molecular features (metabolites) associated with Chr8
ILs. Each green dot is an IL, each red dot a molecular feature (putative metabolite). The black connecting lines show an increase, the blue a decrease. The cluster
of features circled represents significant metabolite changes common to both 8-2 and 8-2-1, among which a reduction in γ-tocopherol and an increase in
α-tocopherol were observed. (B) LC-MS profile validating the identity of tocopherols. (C) Quantitative changes (± s.e.) in tocopherols (γ and α) found in
IL8-2 and IL8-2-1, relative to their M82 comparator. (D) Co-localization of the candidate gene with the change in tocopherol metabolites in IL8-2 and
IL8-2-1, as well as the position of the candidate gene product within the tocopherol biosynthetic pathway.

Experimental design, data acquisition, integration and statistical analyses. From
four plants of each genotype, six ripe fruit (9 days post breaker, dpb) were harvested,
pooled, freeze-dried and homogenized into a fine powder. Triplicate technical
replicates for polar (water/methanol 4:1) and non-polar (chloroform) extracts were
prepared and subjected to NMR and direct infusion (Di)-MS (in both +ve and -ve
mode). These approaches were complemented by targeted procedures using GC-MS,
LC-MS and HPLC-PDA techniques for known phytochemicals 28,33,40 . Quality control
samples were concurrently analysed. Following acquisition and normalization of the
data, computational integration of the datasets was performed. The objective of the
data processing was a data matrix that combined the analytical outputs and facilitated
statistical analysis of metabolites/molecular features for each IL, compared to its M82
control (Supplementary Dataset S1). To achieve this goal, the data generated
were treated in a variety of ways. Firstly, unique chemical shifts were used to identify
metabolites within the NMR fingerprints 45 . This resulted in 500 variables from the
NMR spectra. For direct infusion and LC-MS analysis, the m/z signatures were assigned
nominal masses and ion correlations (including ion adducts). This treatment resulted
in 852 features in the +ve mode and 948 in the -ve mode. Variables generated in +ve
(POS) and -ve (NEG) ionisation modes were annotated incrementally from 50 to
999 (e.g. POS_50). These values represented binned intensities of 1 Da, whereby
POS_108, for example, represents the median of intensities between m/z 107.5 and
108.5. The NMR data were treated in a similar manner with 900 variables consisting
of 0.01 δ bins from 0.505 to 9.495 δ, annotated for example as NMR_0.505, where
NMR_3.455 would equal the sum of intensities between 3.450 δ and 3.460 δ. The
amounts of compounds determined by targeted analysis were also entered into the
data matrix. The molecular weights were used to designate putative formulae, which
in turn were used to search the PlantCyc and MetaCyc databases, generating
putative metabolite identifications, which led to reaction and pathway assignments.
Confirmatory targeted GC-MS and LC-PDA gave rise to a further 30 to 60
metabolites. In this case, retention times and spectral properties compared to
authentic standards were used to provide unambiguous identifications. Multivariate
analysis was initially performed to assess the overall variance in the population and
identify crude changes in chemical composition. The putative annotations are based
on the premise that if the compound were present in tomato and were to form an ion
under the ionisation conditions used, then it would be found in the designated MS
bin. Correlation analyses were performed using Pearson coefficients, between
putative bin identities and targeted compounds from NMR and HPLC, which showed
a high level of correlation (see Figure 5; p < 0.05) for sucrose and hexose (NMR v MS),
rutin, naringenin and chlorogenic acid (HPLC v MS), glutamine, glutamate,
asparagine and aspartate (NMR v MS and GC/MS v MS).
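
The binning scheme described in the paragraph above can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing code used in the study: the function names `bin_ms` and `bin_nmr`, the input format (lists of position/intensity pairs) and the treatment of values that fall exactly on a bin boundary are all hypothetical. MS intensities are grouped into 1 Da bins centred on nominal masses and summarised by the median; NMR intensities are grouped into 0.01 δ bins and summed.

```python
import math
from collections import defaultdict
from statistics import median


def bin_ms(peaks, mode="POS", lo=50, hi=999):
    """Median intensity per 1 Da bin; POS_108 covers m/z 107.5 to 108.5.

    `peaks` is a list of (m/z, intensity) pairs (assumed input format).
    A value exactly on a boundary goes to the higher bin (assumption).
    """
    groups = defaultdict(list)
    for mz, intensity in peaks:
        nominal = math.floor(mz + 0.5)  # nearest nominal mass
        if lo <= nominal <= hi:
            groups[nominal].append(intensity)
    return {f"{mode}_{m}": median(v) for m, v in sorted(groups.items())}


def bin_nmr(points, width=0.01, first=0.505, last=9.495):
    """Summed intensity per 0.01 delta bin, labelled by the bin centre.

    NMR_3.455 covers chemical shifts between 3.450 and 3.460.
    """
    start = first - width / 2  # lower edge of the first bin
    n_bins = round((last - first) / width) + 1  # 900 bins in total
    sums = defaultdict(float)
    for shift, intensity in points:
        idx = math.floor((shift - start) / width)
        if 0 <= idx < n_bins:
            sums[idx] += intensity
    return {f"NMR_{first + i * width:.3f}": s for i, s in sorted(sums.items())}


# Hypothetical readings: three peaks fall in the POS_108 bin.
print(bin_ms([(107.6, 10.0), (108.4, 30.0), (108.0, 20.0), (109.2, 5.0)]))
print(bin_nmr([(3.452, 1.0), (3.458, 2.0)]))
```

Readings outside the annotated ranges (below m/z 50, above m/z 999, or outside 0.500 to 9.500 δ) are simply dropped in this sketch.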

Figure 5 | A correlation network constructed from metabolites involved in chloroplast isoprenoid biosynthesis. Pearson coefficients (r 0.6 to 0.8 with a
p-value significance < 0.05) derived from the metabolite levels associated with all 76 ILs of the S. pennellii collection. Metabolites associated with the
Calvin cycle are shown as blue dots, the chloroplast-located 2-C-methyl-D-erythritol-4-phosphate (MEP) pathway purple dots, chlorophyll degradation
green dots, phytyl synthesis khaki coloured, tocopherol red coloured, prenyl lipid biosynthesis yellow, and carotenoid formation orange. Green
connecting lines represent positive correlations and red connecting lines negative. Putative hubs derived from the number of connections are circled and
the putative metabolites annotated. Collectively, the networks' strengths varied from 0.4 to 0.8 with the average being 0.45. Abbreviations: GGPP,
geranylgeranyl diphosphate; IPP, isopentenyl diphosphate; DMAPP, dimethylallyl diphosphate; MEP, methylerythritol phosphate.

The data were reduced using PCA and, in order to account for environmental
effects, they were also normalised to their M82 controls. This was achieved by first
calculating the mean M82 value for each variable in crop 1 and then crop 2. Then, the
ratio of each biological replicate (plant) to the mean M82 value was calculated. Plants
derived from crop 1 (including the M82 biological replicates) were normalised to the
mean of the M82 comparator for crop 1 and likewise with crop 2. The mean fold
changes from M82 for each of the variables found in every IL were then calculated,
and both ANOVA and Student's t-tests performed to identify variables that were
statistically significantly different (p < 0.05) from M82. False discovery rates (FDR)
for putative mQTL were calculated using the method of Storey and Tibshirani 46 . The
data were shown to be normally distributed using the Kolmogorov-Smirnov (K-S)
test for normal distribution and also visualized using several complementary
approaches. Heat maps were produced in MATLAB v. (http://www.mathworks.co.uk/index.html)
and used to show the fold-change for each variable in every IL compared
to the M82 comparator. An in-house Excel macro was used to plot putative biochemical
pathway changes for each IL, compared to M82. The putative ions for each
compound in every pathway of the PlantCyc database were listed with their mean
fold-change from M82 for all ILs. A graph was then plotted for every pathway
(Supplementary Dataset S2). Pearson correlation coefficients were calculated between
the putative compound ions from the mean fold-change from M82 across all ILs
within each pathway. An in-house Excel macro was used to compile files compatible
with Cytoscape software (http://www.cytoscape.org/). Correlations that were
statistically significant (p < 0.05, two-tailed test) were plotted in Cytoscape to show both
positive and negative interactions. The networks for adjoining biochemical pathways
were then merged to provide an overview of the total network in tomato fruit.
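
The crop-wise normalisation to M82 and the subsequent mean fold-change calculation can be sketched as follows for a single variable. The record layout (genotype, crop, value), the function names and the example numbers are assumptions made for illustration; the statistical tests that follow this step in the workflow are not shown.

```python
from collections import defaultdict
from statistics import mean


def normalise_to_m82(records):
    """Divide each plant's value by the mean M82 value of the same crop.

    `records` is a list of (genotype, crop, value) tuples for one variable
    (assumed layout). M82 plants themselves are normalised as well.
    """
    m82_by_crop = defaultdict(list)
    for genotype, crop, value in records:
        if genotype == "M82":
            m82_by_crop[crop].append(value)
    m82_mean = {crop: mean(vals) for crop, vals in m82_by_crop.items()}
    return [(g, c, v / m82_mean[c]) for g, c, v in records]


def mean_fold_change(normalised):
    """Mean ratio to M82 per genotype, pooled over both crops."""
    by_genotype = defaultdict(list)
    for genotype, _crop, ratio in normalised:
        by_genotype[genotype].append(ratio)
    return {g: mean(r) for g, r in by_genotype.items()}


# Hypothetical intensities: crop 2 runs five-fold higher than crop 1 overall.
records = [
    ("M82", 1, 10.0), ("M82", 1, 30.0), ("IL3-2", 1, 40.0),
    ("M82", 2, 100.0), ("IL3-2", 2, 200.0),
]
print(mean_fold_change(normalise_to_m82(records)))
```

In this toy example the raw IL3-2 values differ five-fold between crops, yet the ratio to the crop-specific M82 mean is 2.0 in both, which is the behaviour the normalisation is intended to achieve.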